


PREPARATION PHASE 



USER DEFINES THE FOLLOWING: 



WEB PAGE CONTENT TYPES THAT THE METHOD MUST RECOGNIZE

N (COMPANY NEWS)
C (CONTACT INFORMATION)
P (PRODUCT INFORMATION)
M (MANAGEMENT TEAM)
D (COMPANY DESCRIPTION)
...etc...



SET OF TESTS THAT PROVIDE EVIDENCE ABOUT THE CONTENT TYPE

T1 = "NUMBER OF EXTERNAL LINKS ON PAGE > 5"
T2 = "NUMBER OF INTERNAL LINKS > 10"
T3 = "LINK TEXT CONTAINS CONTACT KEYWORDS (e.g. ADDRESS, LOCATION, CONTACT, etc.)"
T4 = "NUMBER OF PEOPLE NAMES IN PAGE > 3"
T5 = "PAGE CONTAINS STOCK TICKER SYMBOL"
T6 = "PAGE CONTAINS HEADER STARTING WITH WORD "ABOUT...""
...etc...
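The tests T1 to T6 listed above can be read as yes/no predicates over features extracted from a page. The following is a minimal sketch under that reading, not the method's actual implementation. The `Page` class, its field names, the `CONTACT_KEYWORDS` tuple, and `run_tests` are hypothetical and introduced only for illustration; how link counts, people names, or ticker symbols are extracted is not specified in the figure and is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed keyword list, taken from the examples given for T3.
CONTACT_KEYWORDS = ("address", "location", "contact")


@dataclass
class Page:
    """Hypothetical container for features already extracted from a web page."""
    external_links: int = 0
    internal_links: int = 0
    link_texts: List[str] = field(default_factory=list)
    people_names: int = 0
    has_ticker: bool = False
    headers: List[str] = field(default_factory=list)


# Each test maps a page to True or False, mirroring T1..T6 in the figure.
TESTS: Dict[str, Callable[[Page], bool]] = {
    "T1": lambda p: p.external_links > 5,
    "T2": lambda p: p.internal_links > 10,
    "T3": lambda p: any(k in t.lower() for t in p.link_texts for k in CONTACT_KEYWORDS),
    "T4": lambda p: p.people_names > 3,
    "T5": lambda p: p.has_ticker,
    "T6": lambda p: any(h.strip().lower().startswith("about") for h in p.headers),
}


def run_tests(page: Page) -> Dict[str, bool]:
    """Evaluate every test on one page and return the outcomes by test name."""
    return {name: test(page) for name, test in TESTS.items()}
```

Keeping the tests in a dictionary means a user can add or remove tests without touching the code that evaluates them, which matches the open-ended "...etc..." in the list.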



FIG. 1 



TRAINING PHASE 50



23

TRAINING SET OF WEB PAGES WITH KNOWN CONTENTS

20

CONTENT TYPES FOR EACH WEB PAGE IN THE TRAINING SET

PAGE    CONTENT TYPES
1       N, C, P
2       N, C
3       D, M
4       M, P, C
...etc...



22

TEST RESULTS FOR EACH WEB PAGE IN THE TRAINING SET

PAGE    T1    T2    T3    T4
1       T     F     T     F
2       F     T     F     F
3       F     F     T     T
4       F     F     T     T



CALCULATE STATISTICS



27 



P(H=N) = 0.20
P(H=C) = 0.20
etc.

P(T1=T/H=N) = 0.4630    P(T1=T/H=C) = 0.2344
P(T1=F/H=N) = 0.5370    P(T1=F/H=C) = 0.7656

P(T2=T/H=N) = 0.2647    P(T2=T/H=C) = 0.6224
P(T2=F/H=N) = 0.7353    P(T2=F/H=C) = 0.3776

P(T3=T/H=N) = 0.7352    P(T3=T/H=C) = 0.2432
P(T3=F/H=N) = 0.2648    P(T3=F/H=C) = 0.7568

etc.
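One plausible way to obtain statistics of this kind is to count relative frequencies in the two training tables. The sketch below does that for the four example pages shown in the training phase. It is an illustration under stated assumptions: the figure does not say which estimator is used, the prior here is taken as each type's share of all label assignments, and `calculate_statistics`, `LABELS`, and `RESULTS` are hypothetical names. Because it uses only the four example rows, it will not reproduce the probabilities printed in the figure, which presumably come from a larger training set.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Content types per training page, copied from the example table.
LABELS: Dict[int, Set[str]] = {
    1: {"N", "C", "P"},
    2: {"N", "C"},
    3: {"D", "M"},
    4: {"M", "P", "C"},
}

# Test outcomes per training page, copied from the example table.
RESULTS: Dict[int, Dict[str, bool]] = {
    1: {"T1": True, "T2": False, "T3": True, "T4": False},
    2: {"T1": False, "T2": True, "T3": False, "T4": False},
    3: {"T1": False, "T2": False, "T3": True, "T4": True},
    4: {"T1": False, "T2": False, "T3": True, "T4": True},
}


def calculate_statistics(labels, results):
    """Return (priors, cond) where cond[(test, h)] estimates P(test=T / H=h).

    P(test=F / H=h) is 1 minus that value. No smoothing is applied, so a test
    that is never true for a type gets probability 0 (an assumption of this sketch).
    """
    label_counts = Counter(t for types in labels.values() for t in types)
    total_assignments = sum(label_counts.values())
    # Assumed prior: share of all label assignments, so the priors sum to 1.
    priors = {h: n / total_assignments for h, n in label_counts.items()}

    cond: Dict[Tuple[str, str], float] = {}
    for h, n_pages in label_counts.items():
        pages = [p for p, types in labels.items() if h in types]
        for test in results[pages[0]]:
            true_count = sum(results[p][test] for p in pages)
            cond[(test, h)] = true_count / n_pages
    return priors, cond


priors, cond = calculate_statistics(LABELS, RESULTS)
```

A production system would likely add smoothing so that a single unseen combination does not force a probability of exactly 0 or 1.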
CLASSIFICATION PHASE



34 



52 



SUBJECT WEB PAGE (UNKNOWN CONTENT TYPE)



TEST RESULTS FOR SUBJECT SITE

36

15

T1=T
T2=F
T3=F
T4=T
T5=T
...etc...

STATISTICS FROM TRAINING PHASE

27

P(H=N) = 0.20
P(H=C) = 0.20
etc.

P(T1=T/H=N) = 0.4630    P(T1=T/H=C) = 0.2344
P(T1=F/H=N) = 0.5370    P(T1=F/H=C) = 0.7656

P(T2=T/H=N) = 0.2647    P(T2=T/H=C) = 0.6224
P(T2=F/H=N) = 0.7353    P(T2=F/H=C) = 0.3776

P(T3=T/H=N) = 0.7352    P(T3=T/H=C) = 0.2432
P(T3=F/H=N) = 0.2648    P(T3=F/H=C) = 0.7568

etc.




BAYESIAN NETWORK
(COMBINE TEST RESULTS AND CALCULATE CONFIDENCE LEVEL FOR EACH CANDIDATE TYPE)

CONFIDENCE LEVELS FOR EACH CONTENT TYPE


32

CONTENT TYPE                CONF. LEVEL
N (COMPANY NEWS)            22%
C (CONTACT INFORMATION)     4%
P (PRODUCT INFORMATION)     89%
M (MANAGEMENT TEAM)         7%
D (COMPANY DESCRIPTION)     92%
...etc...
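To make the combining step concrete, the sketch below applies a naive Bayes rule to the statistics shown for types N and C and tests T1 to T3, using the subject page's outcomes T1=T, T2=F, T3=F. It assumes the tests are conditionally independent given the content type, which is the simplest Bayesian network structure; the figure does not state the actual network. The names `PRIORS`, `P_TRUE`, and `posteriors` are hypothetical. Note that the confidence levels in the table above do not sum to 100%, so the figure appears to score each type separately; this sketch instead normalizes over the two candidates only and is not expected to reproduce the tabulated percentages.

```python
import math
from typing import Dict, Tuple

# Priors and P(Ti=T / H=h) copied from the statistics in the figure (N and C only).
PRIORS: Dict[str, float] = {"N": 0.20, "C": 0.20}
P_TRUE: Dict[Tuple[str, str], float] = {
    ("T1", "N"): 0.4630, ("T1", "C"): 0.2344,
    ("T2", "N"): 0.2647, ("T2", "C"): 0.6224,
    ("T3", "N"): 0.7352, ("T3", "C"): 0.2432,
}


def posteriors(observed: Dict[str, bool],
               priors: Dict[str, float],
               p_true: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Naive Bayes: P(H=h / tests) is proportional to P(H=h) * product of P(Ti / H=h)."""
    scores = {}
    for h, prior in priors.items():
        log_score = math.log(prior)  # sum logs to avoid underflow with many tests
        for test, outcome in observed.items():
            p = p_true[(test, h)]
            log_score += math.log(p if outcome else 1.0 - p)
        scores[h] = math.exp(log_score)
    total = sum(scores.values())
    # Normalize over the candidate types considered (an assumption of this sketch).
    return {h: s / total for h, s in scores.items()}


result = posteriors({"T1": True, "T2": False, "T3": False}, PRIORS, P_TRUE)
```

With these three outcomes the evidence favors N over C, because T1 being true and T2 and T3 being false are each more likely under N according to the tabulated probabilities.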
PREFERRED EMBODIMENT



INPUT

TRAINING MODULE
50

BAYESIAN NETWORK MODULE
52

59

OUTPUT
16




FIG. 4 



